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ABSTRACT 



The structure and dynamics of a typical biological system are complex due to strong 
and inhomogeneous interactions between its constituents. The investigation of such systems 
with classical mathematical tools, such as differential equations for their dynamics, is not 
always suitable. The graph theoretical models may serve as a rough but powerful tool in 
such cases. 

In this thesis, I first consider the network modeling for the representation of the biological 
systems. Both the topological and dynamical investigation tools are developed and applied 
to the various model networks. In particular, the attractor features' scaling with system 
size and distributions are explored for model networks. Moreover, the theoretical robustness 
expressions are discussed and computational studies are done for confirmation. 

The main biological research in this thesis is to investigate the transcriptional regulation 
of gene expression with synchronously and deterministically updated Boolean network mod- 
els. I explore the attractor structure and the robustness of the known interaction network of 
the yeast, Saccharomyces Cerevisiae and compare with the model networks. Furthermore, I 
discuss a recent model claiming a possible root to the topology of the yeast's gene regulation 
network and investigate this model dynamically. 

The thesis also included another study which investigates a relation between folding ki- 
netics with a new network representation, namely, the incompatibility network of a protein's 
native structure. I showed that the conventional topological aspects of these networks are 
not statistically correlated with the phi-values, for the limited data that is available. 
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OZETQE 



Tipik bir biyolojik sistemin yapisi ve dinamigi, ogelerinin birbirleri ile homojen olmayan 
ve giiglii etkile§imleri sebebiyle karma§iktir. Dinamik incelemelerde kullamlan tiirevli den- 
klemler gibi klasik diyebilecegimiz matematiksel yontemler, bu tiir karmasik sistemlerin 
incelenmesinde her zaman uygun olmayabilir. Qizge kuramsal modeller ise daha yiizeysel 
olsa da bu tiir sistemlerin incelenmesi igin daha etkili bir yontem olabilir. 

Bu tezde, ilk olarak biyolojik sistemlerin sunumu igin ag modellemesi ele aldim. Topolo- 
jik ve dinamik inceleme araclari gelistirilip ge§itli model aglara uyarlandi. Ozelde, model 
aglar igin gekici ozelliklerinin sistem buyuklugii ile olgeklenmesi ve dagihmlan incelendi. 
Ayrica, kuramsal dayamkhhk ifadeleri tartisilhp ve hesaplamah olarak dogruluklari sinandi. 

Bu tezdeki ana biyoloji arastirmasi, transkripsiyonel gen ifadesinin diizenlenmesinin 
e§zamanh ve deterministik giincellenen Boolyan ag modeli ile incelenmesi olmu§tur. Etk- 
ile§im agi bilinen maya, Saccharomices Cerevisiae 'nm gekici yapisim ve dayamkhhgim in- 
celedim ve model aglar ile kar§da§tirdim. Ayrica, mayamn gen ifadesi agmm topolojik 
muhtemel temellerini irdeleyen yeni bir modeli tarti§dim ve bu modeli dinamik olarak in- 
celedim. 

Bu tezde ayrica bir ba§ka ag modellenmesi olan; asil protein yapismdan elde edilen 
bagdasmaz (incompatibility) ag ile protein kinetiginin incelenmesi yer almaktadir. Elim- 
izdeki simrh veri ile yapilan smamalarda geleneksel olarak kullamlan belirli topolojik ozelliklcr 
ile fi-degerleri arasmda bagmti olmadigim gosterdim. 
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PROLOGUE 



I remember my scientific interest related to biology at first started in my second year in 
physics undergraduate. I was very enthusiastic about arranging a sort of scientific article 
reading group with my classmates at that time. For the first of reading I was searching 
through the internet for a scientific article that is not very complicated so that our education 
let us understand the concepts. By luck I found an articlqj about the biology of human 
hearing and its mathematical modeling. I was impressed by this marvelous organization in 
the ear and made the article be our initiator reading-piece. 

Unfortunately, this reading group did not gather for the second time but helped me 
understand deeply two important things. The first is that it is not easy and recommendable 
to do something "social" with the physicists. The second and more related to this context 
is that biology is not scary as I used to consider, on the contrary, it seems to encompass 
many bright inquiries about the nature. 

In the following period up to the last grade in undergraduate, my interest in biology 
had increased gradually. I remember some of my popular scientific readings at that period: 
Schrodinger's book "What is Life?", Watson's book "Double Helix", Dawkins' book "Selfish 
Gene", etc.. I had taken some courses related to biology and ecology. At last grade I had 
already been sure to pursue in the life sciences in academy. 

Then in September 2005, I started my master degree in Computational Sciences and 
Engineering program at Kog University. This program has let me appreciate some necessary 
knowledge about computation and given a chance to do research in biology. In particular, 
I have investigated the protein folding problem and the transcriptional gene regulation in 
my thesis. And now I introduce you my studies and research throughout this thesis. 

But before passing to the thesis, at this moment, let me ask myself some very basic 
questions and present the answers so that you can see where I am standing and where I am 

1 Unfortunately, I do not remember exact reference. 



looking through the biology. 

- What is Biology? 

For me, the biology is a branch of science which tries to answer the questions gathered 
around a very philosophical question: "What is life?" 

- How did this branch of science emerge, what is its history in brief? 

I think the history of human knowledge on biology has no starting time since every 
species must have some information regarding their own and other organisms for surviving 
in evolutionary period. However, our knowledge on this history at the preliterate time 
is suspicious and mainly relies on the guesses. If we want to have a more concrete idea 
emerging from evidences or documents we should go back to Egypt at the era of 3500-1500 
BC (at least for the western history). But, for the sake of the reader's wonder it might 
be worthy to mention that people of preliterate ages were able to classify the animals and 
plants, to say which plants are/are not toxic, are suitable for some basic medical purposes. 

We know that starting from Egypt (3500-1500 BC) the biological knowledge had in- 
creased in developed civilizations of the time but merely due to practical needs, i.e. anatomy 
and agriculture. Also, it was very mixed with mystery, magic and superstition. As long as I 
know, the first examples of what today we call "the scientific studies" (more abstract ques- 
tions) regarding the biology belong to ancient Greece. Later, while we see a progress in the 
knowledge of living things in Arab-Islam civilizations and in Europe of Renaissance I think 
the actual roots of today's biology belong to Mendel's genetics and Darwin's evolutions . 

When we come to the 20th century with the development in technology, we witness 
many improvements in biology, such as exploring the DNA as a genetic material. Today, 
we even decipher the genetic code of many species and possess tremendous data. With an 
analogy to physics, biology seems to stand at the point of physics at the beginning of 20th 
century. 

- How about the methodology of biology? 

As a physicist by training, biology seems to be an almost pure empirical discipline to 
me, however, especially for the last decades there has been a tremendous flow of scientists 

2 See References [T|[2| for further readings. 
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into the life sciences from other disciplines whose methodology depends mainly upon the 
theoretical studies. Therefore, we can interpret this as the methodology of biology is just 
in change. 

- What are the main difficulties of biology today? 

The biological systems, such as cells, are very complex in terms of both structure and 
dynamics. In other words, there are many variables that are interacting with many other 
variables in time and space. Today's classical approach in science is limited and it seems to 
me as the main difficulty in biology. Moreover, there are tremendous data which are waiting 
to be analyzed, however, there seems to be not a unification for collecting and interpreting 
these data. That has been at least very confusing and challenging for me while surveying. 

- What is the future of biology? 

There is of course not a direct answer to this question but it is seen from the avalanche 
of scientists on biology that this century will see many developments in the knowledge of 
living things. 

Murat Tugrul 
Sariyer, October 2006 
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Chapter 1 
INTRODUCTION 

The development and use of technology have accumulated a vast amount of experimental 
information for biology, such as DNA sequences of different species. However, our knowledge 
on how a cell works remains largely unexplored [3j d] . 

An important component of functional organization in the cell is the regulation of 
gene expression. Many interacting gene pairs of some organisms were identified with a 
high accuracy [5], especially for the yeast, Saccharomyces Cerevisiae [6j. The networks, 
or with a mathematical terminology: graphs [7], serve as a simple but powerful mathe- 
matical representation of the regulation of the gene expression, i.e. gene regulation net- 
works (GRN). Many topological tools have been developed for the investigation of the 
networks [8] [10], [TTJ [T2J 0] and they have been already applied to the yeast and other 
GRNs extensively [HE! [5]. 

It is known that the genes of eukaryotes are not always active [15] and their activation 
profiles show a very complex dynamical aspect beacuse of the strong and inhomogeneous 
interactions. As a consequence, the studying the dynamics of a gene regulation is not 
easy with the classical dynamical investigation tools like differential equations. In gene 
regulation literature, the deterministically and synchronously updated boolean networks 
have been used widely for the dynamical investigations [161 I17j . The boolean model is 
based on a 1/0 binary representation of the individuals at discrete time steps. Though such 
modelings are approximate [18], it has been shown that some applications predicted the 
wild and mutant phenotypes correctly |19 ^ 1201 [2Tj . 

Another challenge for the dynamics of gene regulation is that the rules (functions) that 
govern the interactions are not known in detail, therefore; the statistical approaches with 
random functions gain importance. Some experimental studies established some canalazing 
behaviors in the regulation functions [22J. Later, it was argued that a subset of canalazing 
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functions which was named as the nested canalazing function exists in yeast gene regula- 
tion |23j . A very recent study which depended on a logical formalism (AND and OR) claimed 
that two subclasses of the nested canalazing functions are dominant in the yeast }24j . 

Since the size of state space 2 N , where N is the number of individuals, is finite quantity 
with a boolean approach, a deterministically and synchronously updated dynamics fall into 
the state cycles which are called attractors. The attractors are used for investigation of 
the dynamics and it is argued that attractors in GRN correspond to some cycles in the 
cells such as phenotype \19\ I20| . In particular, the number and length of attractors, the 
average transient length to the attractors and the basin of attractions are explored. Another 
notion for the dynamics is the robustness of the system against the perturbations. There 
is a hypothesis related to robustness of the living systems; "Life at the edge of the chaos" 
which states that biological systems are to be robust against the perturbations but at the 
same time must be able to adapt the environments in order to be successful in evolution 

na da us]. 

In order to simulate gene regulation, artificially created network models have been used 
in literature, to my knowledge, starting from Kauffman [16] . Many dynamical studies using 
the boolean approach have been performed with the model networks. A highly used model 
was called "Kauffman Networks" which has N nodes and exactly 2 incoming edges (in this 
thesis, this model is the in-NK network with K = 2). For these model networks, it was 
believed that the number and length of attractors scale with y/N with random functions, 
but Socolar & Kauffman recently argued that the number of attractors scales faster than 
linear [27] . Apart from the attractors, the robustness of in-NK model with random functions 
which can be explained analytically by Derrida & Pomeau [28] was extensively studied in 
literature [T7] . 

In this thesis, the theoretical background is discussed first and applied to the artificially 
constructed model networks; of in-NK type, which has N nodes and exactly K indegree edges 
for each node; of in-PL type, which has a power-law indegree distribution and argued to be 
found in the natural systems often [T7]; of in- EXP type, which has an exponential indegree 
distribution as in yeast GRN [13] . Upon investigation with conventional topological tools, 
the dynamical investigations take part for those model networks. In order to be able to 
compare the results with the literature, the investigations are performed with the structural 
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parameter of (fcj n ) = 2.0 and with the boolean functions parameter p = 0.5, which is the 
probability for assigning a gene to be expressed [29]. The distribution of attractor features 
show a power-law decay [291 [30] , Also, it is observed that simple random functions produced 
larger average values of attractor features for these model networks than other canalazing 
types do. Apart from the attractor structures, the robustness was investigated. It is shown 
that Derrida's robustness expression for random functions |28|. I17j predicts the robustness 
values of the model networks within a finite-size effect. 

The yeast GRN is dynamically investigated with exploring the attractor and robustness 
structures and compared to the model networks. The average and distribution of the yeast 
attractor features were compared for different types of functions. It has been seen that the 
special subclasses of nested canalazing functions produce high number of attractors and 
short length of attractors and transients in the dynamics realizations. Also, it is observed 
that the distributions of the number of attractors and entropy are not typical and show 
a different profile unobserved before. Li et al. stated that the yeast GRN is robustly 
designed [31J. The robustness of the yeast GRN with various functions and p- values is 
computed. Morover, the model networks whose degree distributions are similar to the yeast 
are compared with the yeast dynamics and it is shown that those networks fail to mimic the 
attractor features while predicting the robustness structure correctly. Furthermore, a recent 
model, which is called in the thesis as Balcan et al. model, is discussed and some of the the 
results were reproduced |32j . It is shown that while Balcan et al. model is very successful 
for producing networks that are topologically similar to the yeast, it fails to mimic both 
attractor and robustness structures of the yeast GRN. 

The thesis also included another study which investigates a relation between folding 
kinetics with a new network representation, namely, the incompatibility network of a pro- 
tein's native structure. Starting from the description of the protein and the protein fold- 
ing problem, I discuss a novel approach to the problem proposed by my thesis advisor 
Assist. Prof. Alkan Kabakgioglu and tried to investigate the relation between proteins's 
structure and folding kinetics. I showed that the conventional topological aspects of these 
networks are not statistically correlated with the (^-values, for the limited data that is 
available. 
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Outline: There are 5 chapters in this thesis. Chapter [2] gives the theoretical and com- 
putational backgrounds and some applications with model networks. Chapter [3] consists of 
topological and dynamical investigation of yeast gene regulation network with a comparison 
to exponential and Balcanei al. model networks. Chapter 0] is for the investigation of a new 
approach to the protein folding problem. The final part of the thesis includes conclusions 
and appendices that contain extra informations. 



Chapter 2: Network Modeling 



5 



Chapter 2 
NETWORK MODELING 

This chapter is organized as follows: Section \2. II is an brief explanation of Graph Theory 
which is the mathematical roots of networks. Section 12.21 is an overview of conventional 
tools for topological analyse of networks and gives some applications with model networks. 
Section [2T31 constructs the necessary modeling and tools for dynamics, and applies to model 
networks. 

2.1 Graph Theory 

Considering the dynamics and structure of complex systems, we need better mathematical 
tools than ordinary ones, such as differential equations, to deal with these systems [8]. For 
such a purpose, networks are used in scientific representation of complex systems. 




Figure 2.1: The problem is whether it is possible for a citizen of Konigsberg (today's 
Kaliningrad) to start from somewhere in the city and cross the bridges exactly once and 
return to initial place. 

A more mathematical term for "Network" is "Graph" and all tools of network modeling 
can be originated from Graph Theory which is a popular subdiscipline of mathematics at 
present. It is believed that the starting point of Graph Theory goes back to 1736: Euler's 
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negative proof to the famous Konigsberg Bridge Problem (Figure fe.llfl ). Many developments 
in graph theory have been achieved in the previous century with an increasing interest from 
other sciences [7j. 

Here, I emphasize only the fundamental concepts related to my thesis. The definitions 
of Graph Theory are given as follows (as adapted from West [7] ) : 



Graph: A graph (or network) G is a triple consisting of a node (or vertex) 
set V(G), an edge set E(G), and a relation that associates with each edge two 
vertices (not necessarily distinct) called its endpoints. 




Figure 2.2: A simple graph (network); numbered objects in the figure are vertices (nodes) 
and the arrows are directed edge. 



Loop, Adjacent and Neighbor: A loop is an edge whose endpoints are 
equal. When the nodes u and v are the endpoints of an edge, they are adjacent 
and neighbors, and will be shown as u <^> v 

Path and Cycle: A path is a simple graph whose nodes can be ordered 
so that two nodes are adjacent if and only if they are consecutive in the list. 
A cycle is a graph with an equal number of nodes and edges whose nodes can 
be placed around a circle so that two nodes are adjacent if and only if if they 
appear consecutively along the circle. 

Subgraph, Connectedness, Cluster: A subgraph of a graph G is a 
graph H such that V(H) C V(G) and E(H) C E(G) and the assignment of 
endpoints to edges in H is the same as in G. We then write H C G and say 
that "G contains H". A graph is connected if each pair of nodes in G belongs 



The figure is taken from http://www.britannica.com/eb/article-9384377/Konigsberg-bridge-problem. 
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to a path; otherwise, G is disconnected. A cluster is a connected subgraph in 
a graph and has no edge to nodes which are not in this subgraph. 

Incident, Degree, Isolated node: If node v is an endpoint of edge e, then 
v and e are incident. The degree of node v, d(v), is the number of incident 
edges. An isolated node is a node of degree 0. 

Directed Graph (or Digraph): A directed graph or digraph G is a 
triple consisting of a node set V(G), an edge set E(G), and a function assigning 
each edge to an ordered pair of nodes. We say that there is an edge from vi to 
Vj and show it as Vi —* vj. 

Component, Adjacency Matrix: The components of a graph G are its 
maximal connected subgraphs. The adjacency matrix of G, written A(G), is 
the n-by-n matrix in which entry Aij is the number of edges in G with v i —* v j . 

Outdegree, Indegree: Indegree of a node v, d(v)i n , is the number of edges 
into v. Outdegree of v, d(v) ou t, is the number of edges from v. 

2.2 Network Topology 

Networks are mathematical objects that represent the real complex systems. In order to 
classify the network structures, here I tried to examine the topological properties and quan- 
tifiers at first and later gave some examples with model networks. 

2.2.1 Topological Properties and Quantifiers 

Other than the number of nodes N and of edges N(e), one needs some features for both 
comparing and classifying the topologies of networks. Here, I listed some which were used 
throughout this thesis. For more detailed readings one can see the review articles [U |8]. 

a- Degree Distribution, P(k) 

The degree probability distribution is the probability distribution function for finding a 
node of network with degree k. 

If the network is non-directed then there is only one degree notion, however, if the 
network is directed then one can define three different degree distributions. Total-Degree 
Distribution, P{ktot)'- In this case the network is pretended to be non-directed, in other 
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words, directions of the edges are removed, and the degree distribution is explored. Out- 
Degree Distribution, P(k ou t): In this case, one counts only the edges outgoing from node 
and calculate their distribution. In-Degree Distribution, P(ki n ): In this case, one counts 
only the edges incoming to node and calculate their distribution. 

b- Degree-degree correlation, k nn (k) 

Degree-degree correlation function gives us the average degree of a node which our fc-degree 
node connects [32], [13] . 



where p(k\k') is the conditional probability that the node with degree k is connected to a 
node with degree k'. 

c- Clustering Coefficient, C{ 

The clustering coefficient of a node is the fraction of the existing triangles including the 
node in quest to maximum possible number of triangles including this node [8]. Using the 
notation Aj for the number of triangles including v j and knowing the fact that the maximum 
number of triangles is fcl ^ — , one can state the clustering coefficient of v% Cj as follows: 



One can define another quantity C(k) which give us the average Cj of the nodes whose 
degree is k [32] . 





(2.1) 



k' 




C(k) — (Ci) d („.) =fc 



(2.3) 



d- Rich-Club Coefficient, r(k) 

One can define N > / : as the number of nodes whose degree is greater than k and iV(e>fc) as 
the number of edges between those nodes. Then, rich-club coefficient r(k) |llj : 



Chapter 2: Network Modeling 



9 



e- K-core 

K-core was proposed by Bollobas |9j as a quantity that reflects hierarchical structuring in 
a network. Starting from k = 0, recursively each node whose degree is less than or equal to 
k is labeled as the member of this fc-core and then pruned with its edges from the network. 
This procedure is repeated until no node whose degree is less than k remains ^ 

/- Motifs 

"Motifs" have been recently proposed by Milo et al. [10] to capture simple subnetwork 
structures in directed networks. Some motifs may appear more frequently in network at 
handle 

g- Shortest Path, sp 

Another important feature in the topology of networks is geodesic or shortest path from 
one node to another. There might be not a single path from Vi to Vj in the network and 
in this case the smallest length path is called shortest path, spij (8J [12] . One can define a 
probability function, P(l), for finding a shortest path of length I. Similarly, one can also 
define spi, average shortest path of Vi to other nodes in the network. 

It should be noted that if two nodes are at different clusters, then their shortest path are 
either taken as infinity, or ignored in the calculations as in this thesis. Networks with small 
shortest paths are in special attention in literature and are named as Small World [33J 
networks [8]. 

h- Betweenness, hi 

Betweenness 6, for Vi is defined as the total number of shortest paths passing through Vi [12] . 
Therefore, one can define a probability function P(b) for finding a node with betweenness b. 
For finding 6j I have used Newman's algorithm given in Reference |12j . 

2 For this topological feature one can use the online tool: http:/ /xavier. informatics. indiana.edu/lanet-vi/, 
October 31st 2007 

3 For this topological tool one can use the free-software mfinder from Uri Alon's webpage 
http : // www. weizmann. ac .il / mcb/Uri Alon / 
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2.2.2 Topological Investigations on Some Model Networks 



First of all, let me define what kind of model networks were used in this chapter. These 
network types were named according to their indegree distributions and this was stressed 
by inserting an in- as a prefix to the model name, such as in-XXX network. Although I did 
not use in this thesis, one should be aware of other sorts of model topologies, like random 
or gaussian. 

in-NK Network: In this model, every node has exactly K incoming edges. In-NK 
model has been used and investigated widely in literature. 

in-Power-Law (in-PL) Network: A network whose degree distribution is given by 
power-law, i.e. P(k) ~ k~ a . It has been observed that many complex systems in nature 
have a PL behavior in their degree distribution^ Those PL networks found in nature come 
with taking the exponents a as shown in Figure [231 [17]. 



lit!) ■ 



M. Aidannf Physics D ISS (2Q03) 4S-S8 



1.5 2 2.5 



Figure 2.3: Histogram shows the distribution of Power-Law exponent of 30 networks found 
in nature. Taken from Reference [17]. 7 in figure equals to a in my notation. 



in-Exponential (in-EXP) Network: A network whose degree distribution is ex- 
ponential, i.e P{k) ~ e~ Afc . Although to my knowledge many real-life networks are PL 
networks, some show an exponential behavior, in particular the gene regulation network of 
yeast shown in Figure [3T3l 



4 In literature the term scale-free network is also used for power-law network 
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a- in-NK, in-PL and in-EXP Network Topologies 

In order to be able to compare different topologies, some of the properties needed to be 
fixed. For the sake of comparison to the network literature, I fixed the number of nodes to 
N = 100 and the average indegree to (ki n ) — 2.00 so that in-PL exponent a = 2.25 and 
in-EXP exponent A = 0.7 were chosen. Figure [274] shows and discuss the correspondence 
of exponents to average indegree of in-EXP and in-PL networks. Next, ensembles of 100 
model networks of type in-NK, in-PL and in-EXP were created and investigated. The 
topological distributions of these ensembles are in Figure l2~75| Figure [2761 for in-NK networks, 
in Figure [2771 Figure [2~8l for in-PL networks, in Figure [2791 Figure [2 .101 for in-EXP networks. 

I found out that in general, the topological features of in-EXP networks are between 
in-NK and in-PL networks'. 
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Figure 2.4: a-) shows the correspondence of PL exponent to (ki n ) both for the computational 
and analytical for N = 100,200,300 (from bottom to top). It is seen that deviation is at 
an important level for comp. and anal, cases. Computational results are used in this 
thesis, b-) shows the correspondence of EXP exponent to (ki n ) both for the computational 
and analytical for N = 100,200,300 (from bottom to top). It is seen that deviation has 
little effect comparing to PL case. Computational results are used in this thesis, o) 
Indegree prob. distribution of computational and analytical cases for PL exponent a = 2.25 
N = 100; this study shows that the difference of computational and analytical cases comes 
from the contribution of k{ n = 1 in the sum. d-) Indegree distribution of computational 
and continuous analytical cases for EXP exponent a = 0.7 N = 100; comparing the PL 
case, the contribution of ki n = 1 to sum is small so it does not deviate from analytical case. 
See Appendix El for analytical (ki n ) expressions. 
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Figure 2.5: 100 in-NK (N = 100, K = 2) networks were created. Their topological features 
were investigated and averaged. Figure shows the corresponding; a-) indegree probability 
distribution, b-) outdegree probability distribution, c-) totaldegree probability distribution 
and d-) degree-degree correlation. 
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Figure 2.6: 100 in-NK (N = 100, K = 2) networks were created. Their topological features 
were investigated and averaged. Figure shows the corresponding; a-) rich-club coefficient, 
b-) clustering coefficient distribution, c-) average shortest path prob. distribution and d-) 
betweenness prob. distribution. 
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Figure 2.7: 100 in-PL (N = 100, a = 2.25, (fcjn) = 2.0) networks were created. Their 
topological features were investigated and averaged. Figure shows the corresponding; a- 
) indegree probability distribution, b-) outdegree probability distribution, c-) totaldegree 
probability distribution and d-) degree-degree correlation. It should be noted that x~ 2 ' 25 
function was drawn in order to help the reader and is not a fitting. 
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Figure 2.8: 100 in-PL (TV = 100, a = 2.25, {hn) = 2.0) networks were created. Their 
topological features were investigated and averaged. Figure shows the corresponding; b- 
) clustering coefficient distribution, c-) average shortest path prob. distribution and d-) 
betweenness prob. distribution. 
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Figure 2.9: 100 in-EXP (N = 100, A = 0.7, (k in ) = 2.0) networks were created. Their 
topological features were investigated and averaged. Figure shows the corresponding; a- 
) indegree probability distribution, b-) outdegree probability distribution, c-) totaldegree 
probability distribution and d-) degree-degree correlation. It should be noted that x -0 ' 7 
function was drawn in order to help the reader and is not a fitting. 
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Figure 2.10: 100 in-EXP (N = 100, A = 0.7, (k in ) = 2.0) networks were created. Their 
topological features were investigated and averaged. Figure shows the corresponding; a-) 
rich-club coefficient, b-) clustering coefficient distribution, c-) average shortest path prob. 
distribution and d-) betweenness prob. distribution 
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2.3 Network Dynamics 

For the investigation of dynamics in networks the Boolean model with synchronously and 
deterministic update was used. In this model, each node Uj has a node state o~i(t) at a 
particular time t where o~i(t) is either 1 (on) or (off). The network state S(t) is the 
set of individual node states: S(t) = {a\(t), cr 2 (i), ••> °"iv(i)}- + 1) is determined by the 
Boolean Function Bi of the node V{. Bi depends on only to indegree neighbors of the 
node Vi. It should be noted that in the case of zero indegree, o~i is fixed to either to 1 or 
for every t. 



ffi(t + l) = < 



(2.5) 



Bi(o- itl (t),o-i, 2 (t), ..,a iikin (t)), k in > 
or 1 (fixed), ki n = 

where <7j,- is the node state of jth in-neighbor of Vi and ki n is the indegree of V{. 
During the dynamics, the node states {o~i(t)} are collected at time step t and inserted 
into each Boolean Function synchronously. The output of Bi is assigned to the <7j(i + 1) for 
time t+1. 
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Table 2.1: An example of the ruletable expression for Bi where Vi has ki n indegree nodes. 



Let me explain these processes by constructing a table which includes all possible com- 
bination of incoming nodes' states and assigns an output to them for a particular node 
state. Such a table is called Ruletable as shown in Table 12.11 If there is ki n incoming 
nodes, then there are 2 kin input combinations, i.e 2 kin rows in the ruletable. The output of 
each combination is either 1 or 0, which makes 2 different ways to construct the output 
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columio Before the dynamics starts, a ruletable out of 2 2 in possibilities is chosen for each 
node, so that dynamics runs deterministically. I used the term network realization for 
one network topology with its all assigned ruletables (functions) in this thesis. 

2.3.1 Some Boolean Function Types 

Although in many systems we may know the interacting pairs of individuals very well, we 
have usually little information about which rules govern the dynamics, as in gene regulation 

ok ■ 

networks [23J. In other words, we do not know which combination out of 2 in to choose 
in the ruletable for each t>j. However, we are able to shape the structure of the Boolean 
Functions. I use 4 types of these random function structures found in literature. 

a- Simple Random Function, RF 

A function type whose each input combination (each row in the ruletable) is assigned to an 
output value of 1 with a probability p. 

b- Canalyzing Random Function, CF 

A Canalyzing Random Function has at least one canalyzing input variable, such that for at 
least one certain canalyzing value of that variable, the output value is fixed [23} 125]. 



where jth in-neighbor is the canalyzing node with Sj as the canalyzing value and Sj as the 
canalazing output. As in RF only one parameter p is used whenever an output value is 
needed to be determined. It is should be mentioned that B^a^i, ..,sj, .., o"i,fc in ) hi Exps. [2761 
is considered to be RF. 

c- Nested Canalyzing Random Function, NCF 

Having investigated the Harris et aVs work on gene regulation [22], Kauffman et al. have 
proposed a new function type known as Nested Canalyzing or Hierarchically Canalyzing 

5 For a better understanding of the magnitude, = 4, 16, 256, 65536, 4294967296 for K = 1,2, 3, 4, 5, 




(2.6) 



respectively. 
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function which is argued to be found in the biological systems |23j . In this type, there is a 
canalazing order in input nodes and the output is determined by first node at its canalyzing 
value [23J is given: 



51.1 = S\ 

51.2 ^ si A a i>2 



S2 



/ si A Oi.2 / s 2 A ... A (Xi j = sj 



(2.7) 



Si,k in / si A cr ii2 / s 2 A ... A (J itkin = s kin 

K s^k~ 0-j.i ^ si A on / s 2 A ... A <r i)i!iB / s k . n 

where P( Si)i =TRUE)=P ( Si =TRUE)-^=^- 
to the canalyzing order. 



' l+exp(-2i a) 



j=l,2..,/cj n and j is numbered with respect 



In this thesis, the definition of NCF is modified for the sake of consistency. Firstly, a p 
parameter as in above functions were adapted. Secondly, the last statement in Exps. [2771 in 
determining the output was altered. Instead of using s7^~, the output value was determined 
again by using p. 

d- Special Subclasses of Nested Canalyzing Random Function, SNCF 

After the proposition of Nested Canalyzing Functions (NCF) by Kauffman et al. [23], Nikole- 
jewa et al. presented "a new minimal logical expression" for all NCFs [24] as follows, 



(2- 



<7j — Bi(a i+ l, CT i+ 2, Vi+kin-l, 0~i+k in ) 

where Q represents either AND or OR logical function, i.e. Q £ {A,V} and a e is for 
possible negation of a, i.e. c® G {a, a} 

They classified the NCFs according to the possible chances for Q. Upon investigation 

they have found that gene regulatory rules are mainly governed 



of Harris et al. data 1 22 



by two subclasses of NCF 

A (<t?+2 A (••• A {af +k ._i A af +k J...)) 



(2.9) 



Nikolejewa et al. have noted in their paper that they have taken the data from Harris by private 
communication 
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and 

A K Q +2 A (... A (a&fcn-l V ^J-)) (2-10) 
with 66.39% and 29.41% probability of occurrence, respectively. 

In this type of function p is not a free parameter and depends on the topology. It 
is easy to calculate p analytically for in-NK model topologies: p~ (2/3) x (1/2 ) + 
(1/3) x (3/2 x ) = 1.66 x 2~ K [24]. For instance, for K = 1,2,3,4,5,6, one finds p = 
0.83,0.41,0.21,0.10,0.05,0.03,0.01, respectively. Figure shows the validity of this for- 
mula for in-NK n alogies. 

0.9 
0.8 
0.7 
0.6 
0.5 

a, 

0.4 
0.3 
0.2 
0.1 


1 2 3 4 5 6 

<k_ m > 

Figure 2.11: p value investigation of Special Subclasses of Nested Canalyzing Random 
Function for model networks. The analytical expression for in-NK networks is 1.66 x 2~ kin 
and it is seen that it predicts correctly for ki n > 1. For other types, p- values are obtained 
from this figure in this thesis. 



NK model 
PL Model 
EXP Model 
(30 Analytical prediction 




2.3.2 Dynamical Properties and Quantifiers 

In order to compare the dynamics of various networks, one needs quantitative measures. 
Here I provide quantities related to two notions: Attractor and Robustness. 

a- Attractor 

Remembering that each node state can be either 1 or 0, the size of state space is 2 N . 
Once the network realization (network topology and ruletables) are fixed, the dynamics is 
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deterministic. In other words, if one chooses a network state Si at time t in 2^ states, s/he 
arrives at exactly one network state at next time step t+1. Also, since 2^ is a finite number, 
at most after traversing all the states, the dynamics starts to fall in a cycle (Figure [2,121) . 
Such a cycle is called attractor and it is an important feature of the boolean dynamics. 




Figure 2.12: A simple network of 3 nodes is shown on the left. Its realization with ruletables 
shown in Table I2T21 produces two attractor as shown on the right. 

It is believed that such attractors correspond to some process cycles in the systems, such 
as phenotype in the cells |25t ll9|. Mendoza et al. showed the correspondence between some 
attractors and known phenotypes in Arabidopsis thaliana by using a similar approach. They 
also predicted some mutant phenotypes and confirmed them by experiments |19j . Some 
other studies also reported similar conclusions [201 [21] 

There are some quantifiers for the attractors in boolean systems. The first one is the 
number of attractors N a ttr hi the network realization. Second one is the number of 
network states of an attractor possesses, length of the attractor L a u r - Third one is 
the average number of network states to arrive at an attractor, transient to attractor 
Tattr- The last quantifier is related to the notion of the basin of attraction. The set of 
network states which go to a particular attractor is called the basin of attraction of that 
attractor. The size of the basin of attraction normalized by 2^ is w a tt r - Recently, Kravitz 
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Table 2.2: The ruletables of the nodes in the network shown in Figure [2.121 



and Schumulevich [33] have proposed an entropy h definition for boolean dynamics: 

h = — Wjlnwi (2-11) 

i 

and h was used here to compare the basin of attractions of the network realizations. 

Attractors were found in this thesis by using a heuristic algorithm (See Appendix iBl for 
more details about attractor finding algorithms.). 

b- Robustness 

For a system to be sustainable, its dynamics should not be effected drastically in every 
intensive or extensive changes, such as errors in individuals or environments. On the other 
hand, the dynamical systems like gene regulation should be open to some changes in order 
to survive through evolution. These arguments brings a hypothesis called "Life at the edge 
of chaos" that states the life systems should be at some where between chaotic and ordered 
phases pH [251 HI] • 

So, how robust system is a valuable to detect for the dynamics and urges us to quantify 
robustness. It is presented here as follows [17] ■ Consider two network states S(t) and S (t). 
Their Hamming Distance HD(t) is the number of nodes that are different in their states 
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at time t [261 [T7]: 



HD(t) = ^2\a l (t)-a' i (t) 



(2.12) 



8=1 



Let me define two other quantities which are the overlapping functions at time step 
t and t+1, respectively: 



x(t) = 1 



HD(t) 
N 



(2.13) 
(2.14) 



M(x{t)) =x(t + l) 

The robustness under small perturbations is measured at the attractor of the system. 
In other words, we have 

x = lim^oo x{t) (2.15) 

while x — > 1~. 




Figure 2.13: Robustness criteria [T7 



Referring the Figure 12.131 we know M{x) > when x = and assume a monotic 
increase in M(x) function. If lim^.^- dM d ^ is gerater than 1 then M(x) function crosses 
M(x) = x line at some x* < 1, which is a stable fixed point other than x = 1. In this 
case, the system is forbidden to arrive at x = 1 unless x = 1. If lim x ^ 1 - is less than 

1 then there is no stable fixed point other than x = 1 and two network states S and S' 
converge soon or later. These two cases are named as chaotic, ordered respectively and 
the case of lim a .^ 1 - dM d ^ equals 1 is named as critical transition border/boundary in 
the corresponding literature [T7] |25" 1 [26] . 
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In sum, with showing the robustness quantity with s 



s = lim 



dM(x) 



(2.16) 



three important phase are summarized as follows 



s < 1 



Ordered 



s 



1 Critical Boundary 



(2.17) 



s > 1 



Chaotic. 



and it is concluded that if s < 1, the system is robust against perturbations while if s > 1, 
the system is very sensitive to them |26j. 

2.3.3 Dynamically Relevant Subnetwork 

Some of the nodes are irrelevant to the attractor results due to topology or functions of the 
network realization. These nodes only serve as computational challenges and some of the 
nodes with their edges can be recursively removed from the network without any general 
change in the dynamics |27j . Such nodes were labeled as irrelevant and the rest of the 
network/nodes were called the dynamically relevant subnetwork/nodes in this thesis. 

The attractor computations in this thesis were done with a minimal dynamically relevant 
subnetwork which is found by use of a procedure which considers only the topology of the 
network to obtain it . The procedure depends on the fact that a node with zero indegree 
stays at a fix state all time steps. Also, a node with zero outdegree does not affect any node 
in the system although its state may fluctuate. Removing these two types of nodes with 
their edges recursively results in the dynamically relevant subnetworks used in this thesis. 

Figure 12.151 presents a computational study which investigate the percentage size of the 
dynamically relevant nodes in the in-NK, in-PL and in-EXP networks for different (fcj n ) and 
N. Main conclusion of this study is that the percentage of relevant nodes depends only on 
the (ki n }, not on the network topology. 

2.3.4 Dynamical Investigations on Some Model Networks 

In order to investigate the dynamics of different model networks and the effect of their 
topologies on the dynamics, in-NK, in-PL and in-EXP network ensembles with (ki n ) = 2.0 
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were studied (Figure \2. 41 shows that in-PL exponent a = 2.25 and in-EXP exponent A = 0.7 
give (ki n ) — 2.00). The reason of choosing (ki n ) — 2.0 was its extensively use in related 
literature. 

a- Attractors of in-NK, in-PL and in-EXP Model Networks 

For distribution of the attractor features, N and p are fixed to 100 and 0.5, respectively. 
The reason behind using this p value was to compare the results to literature. However, 
since p was fixed, the SNCF whose p is not a free parameter was not used in this part. For 
each model type, I used 200 networks with 10 realization for each network in computations. 
Attractors were obtained after sampling 1000 initial conditions with the limits of maximum 
step and maximum length of an attractor as 1000 and 200, respectively. 

The distribution of the number of attractors N a u r , the length of attractor L a u r , transient 
Tattr and the entropy h a ttr are shown in Figure \2AQ\ Figure \27Tf\ Figure ETH] and Figure l27T9| 
respectively. Apart from the distributions, averages of these features are given in Table 




Figure 2.14: The subnetwork yielded after pruning recursively the nodes with either zero 
outdegree or zero indegree is named dynamically relevant subnetwork and this subnetwork 
was used in this thesis during the dynamics runs. 
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Figure 2.15: The fraction of dynamically relevant nodes to network size A at a-) in-NK, 
b-) in-PL and in-EXP networks. For in-NK, the values were found by averaging over 
1000, 800, 500, 300, 200 networks of K = 1, 2, 3, 4, 5, respectively. For in-PL and in-EXP, the 
values were found over 100, 500 networks, respectively. 



Table E31 Table [23] and Table EjJ respectively. 

I found out that the probability distribution functions for N at t r , L at t r , Tattr and h at t T in a 
network realization with in-NK, in-PL and in-EXP networks and RF, CF and NCF decay as 
a power-law function as stated for some topology and function types in References |29] 130] . 
Also, is was noted that both N a ttr and L a ttr shows a strange odd-even oscillations in the 
distributions which was also stated in Reference |30j. After some discussions, it was con- 
sidered that these odd-evenness due to from artificial effects, for instance, the combinations 
of 2-, 3-, etc. node partial network states tends to create evenness. Furthermore, it should 
be noted that RF gives out considerably greater average values of those features than CF's 
and NCF's. While the average values were closer to each other with CF and NCF for all 
types of topologies, the averages with RF are higher than other for in-NK topology. 

I also checked the scaling of the average values of the quantities above with number of 
nodes A for random, canalazing and nested canalazing functions with p = 0.5. N were 
chosen as 50, 55, 60, 66, 74, 82, 92, 100, 113, 124, 136, 149, 163, 179, 200, 215, 236, 259, 
284, 300, 343, 377, 414, 455, 500, 550, 605, 665, 731, 804, 884, 972, 1000 in order to have 
a more accurate scaling behavior at small A^s but also to check big As. For A < 100, 200 
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(Naur) 


in-NK 


in-PL 


in-EXP 


RF 


12.00 =F 24.06 


5.28 T 9.49 


9.20 =F 21.95 


CF 


3.97 =F 7.92 


3.11 T 5.06 


3.81 T 6.49 


NCF 


4.59 T 11.18 


3.54 T 17.17 


2.86 =F 4.69 



Table 2.3: The average number of attractors {N a u r ) of model networks for random RF, 
canalyzing CF, nested canalyzing NCF functions. For 200 networks with N = 100 and 
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Figure 2.16: The number of attractors prob. distribution for model networks. RF: 
Random Function, CF: Canalazing Function, NCF: Nested Canalazing Function. For 200 
networks with iV = 100 and 10 realizations for each network, the attractors were found 
by initiating from 1000 initials conditions with limits of 1000 maximum step size and 200 
attractor length. In order to uncover the artificial odd-even effect as shown in small frame, 
the data were binned in 2 units. 
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(L a ttr) 


in-NK 


in-PL 


in-EXP 


RF 


11.69 =F 22.36 


12.13 =F 71.60 


20.15 T 101.82 


CF 


3.05 =F 4.39 


3.39 T 23.48 


3.77 T 26.58 


NCF 


2.98 =F 3.83 


2.02 T 2.60 


2.12 =F 1.92 



Table 2.4: The average length of attractors of model networks. RF: Random Function, CF: 
Canalazing Function, NCF: Nested Canalazing Function. For 200 networks with N = 100 
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Figure 2.17: The length of attractors {L at t r ) prob. distribution of model networks. RF: 
Random Function, CF: Canalyzing Function, NCF: Nested Canalyzing Function. For 200 
networks with iV = 100 and 10 realizations for each network, the attractors were found 
by initiating from 1000 initials conditions with limits of 1000 maximum step size and 200 
attractor length. In order to uncover the artificial odd-even effect as shown in small frame, 
the data were binned in 2 units. 
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(Tattr) 


in-NK 


in-PL 


in-EXP 


RF 


58.44 qp 164.61 


23.52 =F 74.84 


35.87 T 104.79 


CF 


10.55 =F 5.33 


9.63^23.86 


10.64 =F 26.70 


NCF 


10.64 =F 4.68 


7.29 T 3.02 


8.35 =F 2.91 



Table 2.5: Average transient to attractors of model networks. RF: Random Function, CF: 
Canalyzing Function, NCF: Nested Canalyzing Function. For 200 networks with N = 100 
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Figure 2.18: Transient to Attractors distribution of model networks. RF: Random 
Function, CF: Canalyzing Function, NCF: Nested Canalyzing Function. For 200 networks 
with iV = 100 and 10 realizations for each network, the attractors were found by initiating 
from 1000 initials conditions with limits of 1000 maximum step size and 200 attractor length. 
The data were binned in 10 units in order to have clearer distribution. 
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(hattr) 


in-NK 


in-PL 


in-EXP 


RF 


1.30 =F 110 


0.88 T 0.86 


1.17 =F 1.00 


CF 


0.65 =F 0.78 


0.58 T 0.71 


0.68 T 0.75 


NCF 


0.71 T 0.81 


0.43 T 0.72 


0.48 =F 0.68 



Table 2.6: Average values of the entropy of model networks. RF: Random Function, CF: 
Canalyzing Function, NCF: Nested Canalyzing Function. For 200 networks with N = 100 
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Figure 2.19: Entropy h a a r distribution of model networks. RF: Random Function, CF: 
Canalazing Function, NCF: Nested Canalazing Function. For 200 networks with N = 100 
and 10 realizations for each network, the attractors were found by initiating from 1000 
initials conditions with limits of 1000 maximum step size and 200 attractor length. The 
data were binned in 0.2 units in order to have clearer distribution. 
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networks were used for all function types. For N > 100, 100 networks for CF and NCF, 
and 50 networks for RF were used (RF runs slowly than others). The other parameters 
for dynamics were the same with N = 100 case above. The results for {N a u r ), {L a u r ), 
{i~attr) an d (hattr) scalings with N can be seen in Figure [2.201 Figure [2,211 Figure [2.221 and 
Figure [2.231 respectively. 

I found out that for in-NK network with RF; (N a ttr) scales with a fitting jV ' 53 , (L a ttr) 
scales with a fitting iV 0,87 , (r a tt r ) scales with a fitting iV L04 . For a long time (N a u r ) and 
{L a ttr} scalings were considered as y/~N [23] until Socolar & Kauffman published Refer- 
ence [27J which states that {N a u r ) scales with faster than linear. With this study I have 
shown that scaling is valid for N at t r while fails for L attr . Also, it should be noted that 
for CF and NCF, scalings are very small comparing to RF which might be considered as 
desirable for the biological systems. 
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Figure 2.20: The scaling with N of the average number of attractors for RF,CF,NCF with 
p = 0.1 
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Figure 2.21: The scaling with N of the average length of attractors for RF,CF,NCF with 
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Figure 2.22: The scaling with iV of transients to attractors for RF,CF,NCF with v = 0.5. 
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Figure 2.23: The scaling with N of the average entropy for RF,CF,NCF with p = 0.5. 
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b- Robustness of in-NK, in-PL and in-EXP Topologies 

Numerical studies have yielded that in-NK model networks of K = 2 with random functions 
have privileged dynamical aspects than others, i.e. small, less number of attractors and at 
the boundary between chaotic and order phase |16j . Derrida & Pomeau were the first to 
give an analytical argument for why there is a such critical K value [28] . Later, Aldana 
generalized the argument for other type of topologies and gave more pedagogic expression 

H7J. 

Consider x(t) which was discussed in robustness expression. One can define two sets 
A(t) and B{t) such that A(t) is the set of nodes whose incoming edges come only from the 
nodes whose states are the same in the S(t) and S (t) and B(t) is vice versa. 

G = {vi,v 2 , .-v N } 
A(t) = { Vi , Vj : ( Vj - Vi ) A(^(t) = o-'jit))) (2.18) 
B(t) = G-A(t) 

Then we can express, 

oo 

M(x{t))= V P(k m ){ [x(t)] k - + (l-[x(t)] k ^)x (p 2 + (l- p ) 2 ) }, (2.19) 

^ tn ~^ prob. of A(t) prob. of B(t) prob. the same output 



Contribution from A(t) Contribution from B(t) 

oo 

= Y, P(k in ){-[x(t)] k -(2p 2 - 2p) + (2p 2 - 2p+ 1)}. (2.20) 

If we take the derivative of the both sides with respect to x(t) and use the fact that 

d J&$l = (2p 2 _ 2p) £ hn[x{t)] ^-i p{km y (2>21) 
ax{Z) ki=i 
Remembering X]fc°„=i hnP(hn) = (&m) and Eq. 12.161 one finds; 

lim dM(x(t)) = Hm (fc . n)a . (t) <fc in )- 1(2p 2 _ 2p) (2 22) 
x(i)-»l- dx(t) s(t)-»l- 

= 2p(p-l)(fc in ). (2.23) 

I tried to examine the validity of Formula 12.231 for in-NK, in-PL and in-EXP net- 
works with the simple random function (RF) by comparing the analytical and computa- 
tional results. Especially, the critical chaotic-ordered boundary, s = 1, was checked. The 
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robustness computations were done for different sets of p G {0.0, 0.01, 0.02, 0.49, 0.50} 
and corresponding parameters for (ki n ): for in-NK, K £ {1, 2, 5, 6}; for in-PL, a S 
{1.60, 1.65, 1.70, 2.45, 2.5} and for in-EXP, A G {0.30, 0.35, 0.95, 1.0}. For the analyt- 
ical expression (Eq l2.23| ). the relations between {ki n } with a, A were yielded by consulting 
the Figure [2~4l The results can be seen in Figure [2,241 m Figure [2.251 and in Figure [2.261 for 
in-NK, in-PL and in-EXP networks, respectively. 

I found out that robustness values, especially the critical boundary, match with the 
analytical values (Formula 12. 23p for in-NK Model. The analytical expression also predicted 
the robustness values of in-PL and in-EXP networks but not as in-NK networks' case. I 
saw that matching for these networks gets better while N is increased which concluded as 
finite-size effect. In short, Expression 12.231 is successful for predicting the robustness of the 
networks for simple random functions. 

I also checked the variation of the robustness for all types of functions (RF, CF, NCF, 
SNCF). Again, each network topology was set to {kt n = 2.0) and N = 100. Each robustness 
value for corresponding (ki n ) parameter and p was yielded by using the following parameters: 
100 random initials conditions for each 10 network realizations for each 10 networks. As 
it can be seen in Figure 12.271 canalazing and nested canalazing functions resulted in more 
ordered robustness values than simple random and special subclasses of nested canalazing 
functions for all types of networks. 
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Figure 2.24: Analytical(— — — ) and computational^ — h +) the robustness values were 
compared for in-NK model with N = 100 for each K G {1,2,. .,6} and p € 
{0.0, 0.01, .., 0.49, 0.50} for simple random functions. For each K,p, 10 networks and for 
each network 10 realization were constructed and the robustness was calculated starting 
from 100 initials conditions of each realization. Here, s=l corresponds the critical border 
for transition from ordered to chaotic regimes. It seems that Derrida's results matches the 
s=l border. 
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Figure 2.25: As in previous figure, the computational and analytical robustness was com- 
pared for in-PL networks with N = 100, a = 1.7, 1.75, .., 2.5 and p G {0.0, 0.01, .., 0.49, 0.50} 
for simple random functions. For each a,p, 10 networks and for each network 10 realization 
were constructed and the robustness was calculated starting from 100 initials conditions of 
each realization. The computational and analytical results are close to each other, it was 
seen with bigger N values, the matching got closer which concludes a finite-size effect. 
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Figure 2.26: As in previous figure, the computational and analytical robustness 
was compared for in-EXP networks with N = 100, A = 0.3, 0.35, .., 1.0 and p € 
{0.0, 0.01, .., 0.49, 0.50} for simple random functions. For each X,p, 10 networks and for 
each network 10 realization were constructed and the robustness was calculated starting 
from 100 initials conditions of each realization. As in previous case, it was seen with bigger 
N values, the matching got closer which concludes a finite-size effect. 
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Figure 2.27: Figure shows the robustness values of the a-) in-NK, b-)in-PL and c-)in-EXP 
networks of (k in ) 9* 2.0 and N = 100, p G {0.0,0.02, ..,0.50} for RF: Simple Random, CF: 
Canalazing, NCF: Nested Canalazing and SNCF: Special Subclasses Nested Canalazing 
Functions. SNCF robustness results and corresponding p values for in-NK, in-PL and in- 
EXP are 0.99 =F 0.05, 0.93 T 0.05, 0.92 =F 0.05 and 0.42, 0.44, 0.39, respectively. 
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Chapter 3 
GENE REGULATION 

This chapter is organized as follows: Section [3, II introduces the gene regulation concept 
in biology. Section [3.21 gives the yeast gene regulation network (GRN) with the topological 
and dynamical investigations. Section 13.31 compares the yeast GRN with model networks 
whose indegree probability distribution is exponential. Section 13.41 emphasizes a recently 
proposed model which produce networks that are topologically similar to yeast GRN and 
discusses this model dynamically. 




Figure 3.1: The DNA is considered to fulfill three main function in the life systems: 1-) 
storage, 2-) heritage and 3-) expression of the genetic information in the cells [2] 



3.1 Introduction 

To my knowledge, the most recent scientific definition of "gene" (Figure r3.llm was proposed 



by Gerstein et al.: "A gene is a union of genomic sequences encoding a coherent set of 
potentially overlapping functional products." 35]. Genes in prokaryotes are always active 

1 Taken from http://en.wikipedia.org/wiki/Gene 
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and express their coded information into functional elements unless they are repressed by 
outside factors. However, at a particular time genes of eukaryotes (in Reference [15J it is 
stated as 2-15%) are generally inactive and need to be activated, therefore, one can talk 
about regulation of gene expression in eukaryotes [15J. 

There are different types of regulation of gene expression data depending of how it is 
detected. In this study, I used the transcriptional GR since the data used at this thesis 
was supplied by monitoring the levels of transcription materials, i.e. mRNA. For the details 
of experiments to yield the data, see References [6j O [36] • 



r 




Figure 3.2: A brief explanation of the gene regulation process used in the thesis [32 



The transcriptional regulation of gene expression, or in short, the gene regulation is 
modeled as follows: Each gene on the DNA possesses two main regions. The first is the 
promoter region (PR) and the second is genetic codes that are transcribed. In order 
to activate/inhibit the transcription, the necessary conditions need to be hold at the PRs. 
Although these conditions are more complex, they are simplified as existence/ absence of 
transcription factors (TFs) that are unique proteins bind the PRs. When the conditions 
hold, the gene is activated/inhibited for transcription depending on the rules of this specific 
gene. The products of these processes could be either the functional proteins or TFs (See 
Figure [3T2l for a simple scatching of the GR process [32]). More details about gene regulations 
can be found in References |15[ [3]. 

My main aim in this part of the thesis is to both topologically and dynamically investi- 
gate the yeast gene regulation by using the network tools introduced in Chapter [21 
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3.2 The Example at Hand: Saccharomyces Cerevisiae (Yeast) 

Saccharomyces Cerevisiae or the yeast is a unicellular eukaryotic microorganism. Its gene 
regulation data [6] was used in this thesis since it is one of the mostly studied organism. 
The data was retrieved from YEASTRACT3 database [37] When the data was taken it 
was including 4252 genes (146 of them are TFs) with 12541 interactions. 

3.2.1 Topology of the Yeast Gene Regulation Network 

My investigations for some conventional topological features of the yeast GRN can be seen 
in Figure [3731 and Figure [3741 Previously, Guelzim et al. have topologically investigated the 
yeast GRN and stated an exponential decay in the indegree distribution with an exponent 
A = 0.45 [13] . However, my indegree distribution was fitted to an exponential decay with 
an exponent A = 0.38 =p 0.01 by using GNUPLOT[f| with ignoring ki n = 0. The reason of 
this difference might be due to that my fitting was done by at first taking log of y-values 
and then fitting to a linear function. I also did a direct fitting resulting in A = 0.46 =p 0.01 
exponential decay which is is almost the same as Guelzim et aVs. Another detailed studies 
for topological examination of yeast can be found in References |32[ 15]. 

The yeast GR network includes 4252 nodes/genes with 12541 directed edges/interactions 
(average degree is 2.95). 146 of those genes are TFs and there are 403 interactions between 
TFs (average degree is 2.76). Dynamically relevant subnetwork of yeast GRN consists of 82 
TFs and 254 interactions (average degree is 3.10). 

Comparing the artificial networks, the fraction of dynamically relevant nodes to the 
number of nodes in yeast GRN is very low (82/4252 = 2%). As shown in Figure 12,141 the 
artificial model networks with the same indegree distributions show a fraction of 85 — 90%. 
Figures 13.31 and 13.41 also show the topological features of dynamically relevant subnetwork 
of the yeast GRN and one can state the topologies are similar. This specialties of yeast 
GRN might be crucial in its dynamics in real case. 

2 www.yeastract.com 
3 http:/ / www.gnuplot.info/ 
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3.2.2 Dynamics of the Yeast Gene Regulation Network 

Although the interacting pairs are known very well, the rules governs the interactions in 
the yeast gene regulation are not identified in detail yet. For this reason, I used the 4 types 
of random functions introduced in Chapter [21 simple random function (RF), canalazing 
random function (CF), nested canalazing random function (NCF) and special subclasses of 
nested canalazing random function (SNCF) with a statistical approach for investigations. 

In order to investigate and compare the attractors of Yeast GRN for all function types, 
one needs to fix the p value. SNCF for dynamically relevant subnetwork of actual Yeast GRN 
gives out an effective p- value p = 0.27=f0.05, therefore, I used this p value as the parameter 
for other type of functions, too. 2000 network realization were done for investigation. For 
each realization, attractors were explored by starting from 1000 random initial conditions. 
In the dynamics, the maximum 1000 steps and the maximum attractor length 200 limits 
were set. The distributions and averages of the number of attractors N a u r , the length of 
attractor L attr , transient T at t r and the entropy h at t T can be seen in Figure 13.51 and Table 13.11 

First of all, it should be noted that N a ttr and h a ur distribution are not decreasing for all 
functions, especially SNCF type shows a not ordinary profile. This type of the distribution 
was not observed while studying with model networks in Chapter [2] and should be discussed 
further. Secondly, the averages with SNCF type shows a significant difference with other 
types. As it is seen in Table I37T] it has a big (N at t r ) while having small (L at t r ) and {T a u r } 
which might be desirable in biological systems [25]. As a conclusion to the attractor results, 
SNCF type seems to be very appropriate for the dynamics of the yeast gene regulation and 
should be continued to be investigated. 

I also obtained the robustness for each p £ {0.00,0.01, ..,0.50} of RF,CF and NCF, and 
for p = 0.27 of SNCF. 10 dynamics realization and for each realization 1000 random initial 
conditions were created and as explained in chapter 2, dynamics were runned for 10 x N 
steps (N = 82 in this case) in order to be able to achieve an attractor. After these steps, 
robustness was measured numerically and averaged for all values. The results can be seen 
in Figure 13.61 

It is shown that for p = 0.27 (attractor statistics was done at this p-value) RF functions 
at the chaotic side near critical boundary while others are at the ordered side. Also, it 
seems that the Derrida's Exp l2.23l predicts RF results quite well. The results also shows 
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Figure 3.3: Yeast GR actual and dynamically relevant sub- network's a-) indegree proba- 
bility distribution b-) outdegree probability distribution, c-) total degree probability dis- 
tribution and d-) degree-degree correlation 





(Nattr) 


{Lattr} 


{Tattr) 


\h a ttr) 


RF 


430.78 =F 263.78 


8.82 =F 13.45 


18.33 T 15.86 


5.40 T 1.10 


CF 


222.77 =F 201.04 


3.77 T 3.12 


8.48 T 3.99 


4.48 =F 1.20 


NCF 


221.15 =F 209.14 


2.84 T 1.93 


6.68 =F 2.36 


4.45 =F 1.27 


SNCF 


538.72 T 212.18 


3.40 =F 2.77 


7.84 T 3.50 


5.96 T 0.61 



Table 3.1: Average values of the number of attr actors N a tt r , the length of attr actor L a tt r , 
transient T a u r and the entropy h a ttr of Yeast GRN for RF: Random Function, CF: Cana- 
lyzing Function, NCF: Nested Canalyzing Function, SNCF: Special Subclasses of Nested 
Canalyzing Function. For the details of the study, see the caption of Figure 13.91 
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Figure 3.4: Yeast GR actual and dynamically relevant sub- network's a-) richclub coefficient 
b-) clustering coefficient, c-) shorthest path probability distribution (Binned in 1.0 units) 
and d-) betweenness probability distribution (for actual network it is shown in big frame 
with binned in 100 units, for dynamically relevant subnetwork it is shown in small frame 
with binned in 10 units.). 
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Figure 3.5: Distribution of attractor features of Yeast for Random Function (RF), Can- 
alyzing Function (CF), Nested Canalyzing Function (NCF) and Special Subclasses of 
Nested Canalyzing Function (SNCF); a-) number of attractors N a u r probability distri- 
bution (Binned in 10 units) b-) length of attractor L a ttr probability distribution (Binned 
in 2 units), c-) transient to attractor T a ttr probability distribution (Binned in 10 units), 
d-) entropy h a ttr probability distribution (Binned in 0.2 units). Attractors were found with 
starting from 1000 initial conditions of each 2000 network realizations with maximum steps 
and maximum L a tt r limits as 1000 and 200, respectively. For RF, CF and NCF, p was fixed 
to 0.27 which is the p of SNCF for yeast. 
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Figure 3.6: Robustness of yeast GRN for all types of functions. For each p value, robustness 
was computed with averaging over 1000 random initial conditions of each 10 realization. 
Also, the Derrida's Exp l2.23"l s = 2p(l — p){ki n ) was drawn with (ki n ) of the dyn. rel. 
subnetwork, 3.1. 



that although SNCF and CF gives the same robustness value for the attractor investigation 
paramater p = 0.27, they could produce different attractor structures. In other words, there 
might be no direct relation between attractor and robustness structure. 
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3.3 in-EXP Model Networks for Yeast GR 

To my knowledge Kauffman is the first who used the model networks to elucidate the gene 
regulation [16]. He used in-NK networks_| for modeling the gene regulations. However, in 
order to compare the dynamics of the yeast, I used 100 in-EXP networks with this exponent 
since indegree of yeast GRN was an exponential decay with exponent 0.38. I set N = 94 
for the sake of achieving a similar number of nodes of the dynamically relevant subnetwork 
of yeast, N = 82 (See Figure [2.141 in Chapter [2] for fractions of dynamically relevant nodes 
to system size for model networks.). 

3.3.1 Topological Investigation 

Before passing to the dynamics, let me compare the topological features of this model net- 
works with yeast dynamically relevant subnetwork'. Figure [3771 and [3TH1 show the investiga- 
tions. It can be seen that out- and total-degree distributions and degree-degree correlations 
resemble that of yeast dynamically relevant subnetwork while others are different. 

3.3.2 Dynamical Investigation 

Similar to Yeast's case, the dynamics was investigated for p = 0.27. It should be noted that 
the p-value for SNCF is found to be p = 0.27 which is the same as Yeast's. In dynamics, 
100 networks, for each network 10 network realization and for each network realization 100 
initial conditions were created. Again maximum 1000 steps and 200 attractor length were 
set. The distributions and averages can be found in Figure [3791 and Table [3721 

Robustness was also studied for each p 6 {0.0, 0.02, .., 0.50} for RF, CF and NCF, andp = 
0.27 for SNCF. The robustness was computed by averaging over 100 networks, 10 dynamics 
realization for each network and 1000 random initial conditions for each realization. The 
results are in Figure [3.101 

The yeast and in-EXP model networks produce different attractors features while show 
similar robustness profiles. The main conclusion of that part is that the model networks 
which are constructed by knowing only indegree distributions and the network size fail for 
attractor features predictions while win for robustness. 



4 K = 2 case is also known as Kauffman networks in gene regulation literature 
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Figure 3.7: in-EXP Model Yeast Network's a-) indegree probability distribution b-) outde- 
gree probability distribution, c-) total degree probability distribution and d-) degree-degree 
correlation 





(N a ttr) 


(Lattr) 




(hattr) 


RF 


5.15 =f 5.17 


68.29 T 216.64 


101.73 =F 221.50 


0.93 T 0.70 


CF 


2.87 T 3.50 


4.31 T 32.04 


11.83 T 32.34 


0.53 T 0.63 


NCF 


1.90 =F 1.88 


2.07 T 2.37 


7.06 =F 3.06 


0.31 T 0.52 


SNCF 


3.94 qp 5.65 


3.70^4.56 


11.99 =F 6.55 


0.69 T 0.79 



Table 3.2: Average values of attractor features of in-EXP Model Yeast GRN. RF: Random 
Function, CF: Canalyzing Function, NCF: Nested Canalyzing Function, SNCF: Special 
Subclasses of Nested Canalyzing Function. For the details of the study, see the caption of 
Figure [3J3 
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Figure 3.8: in-EXP Model Yeast Network's a-) richclub coefficient b-) clustering coeffi- 
cient, c-) shorthest path probability distribution (Binned in 1.0 units) and d-) betwenness 
probability distribution (Binned in 10 units) 
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Figure 3.9: Attractor investigation of in-EXP Model Yeast Networks for Random Function 
(RF), Canalyzing Function (CF), Nested Canalyzing Function (NCF) and Special Subclasses 
of Nested Canalyzing Function (SNCF); a-) Number of Attractors Distribution (binned in 
2 units), b-) Length of Attractors Distribution (binned in 2 units) c-) Transient time 
distribution (binned in 10 units), d-) Entropy distribution (binned in 0.1 units). Attractor 
were found by starting from 100 initial conditions for each realizations. 10 realizations were 
done for each of 100 networks. The limits for maximum step size and length of attractors 
were 1000 and 200, respectively, p of the functions were 0.27 
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Figure 3.10: Robustness Comparison of Actual Yeast and in-EXP Model Networks, a-) 
Simple Random Func. b-) Canalyzing Func. c-) Nested Canal. Func. The robustness was 
computed by averaging over 100 networks, 10 dynamics realization for each network and 
1000 random initial conditions for each realization. 
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3.4 A Model: Root of the Yeast Gene Regulation Network Topology 

Starting from some previous studies [38, 39J, Balcan et al. have arrived at a novel model 
produces the complex networks whose topological properties resemble that of the yeast gene 
regulation network |32j . 

3.4-1 Description of Model 

The model is initiated with a starting fixed number of genes. Next, each gene is assigned to 
be Transcription Factor(TF) coding gene with a probability p. The numbers are opti- 
mized as 6000 genes at the beginning and p = 200/6000 for determining TFs by consulting 
available actual Yeast data [37] . 

The model then assigns two types of random binary sequences, i.e. 110100... The former 
is to all genes being called Prometer Sequence, PS and the latter is to only TF coding 
ones being called Regulatory Sequence, RS. Thus, TF coding genes should have two 
labels where the others have only one. 

o.rsr 
o. io - 

o.os - 

o.oo L 
o 

Figure 3.11: Regulatory sequence distribution of yeast [32] 

The most important part in assigning these random binary sequences is determining the 
lengths of the sequence attached to genes. The RS lengths are taken from a distribution 
yielded from a study of Harbison [30] as shown in Figure [3.111 However, there is no available 
distribution for the PR lengths; therefore, authors have assumed that the distribution obeys 
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the power-law with P(l) ~ l~ , This assumption was based on the investigation of the 
intergenic portions of the DNA and the result of P(l) ~ l^ 1 ^^ where fi is applicable in 
0.0 < fj, < 2.0 |41j . Morover, again referring the Harbison study [40] which argues that the 
most of the likelyhood for determining a TF binding site occurs in a 250 bps window on the 
DNA, they bounded I with the expression Imax — Irnin -)- 1 — 250 where Imin is taken to be 
pick-value of RS distribution. 

After assigning the sequences, a directed edge is placed from TF coding gene (RS) to 
gene (PS), if and only if a RS is fully inside a PS. These processes can be runned several 
times and an ensemble can be created. 

3.4-2 Topological Investigation (Reproduction of Some Results) 

Balcan et al. represented in their study that their model creates networks whose topological 
properties resembles to the actual yeast GRN. I reproduced some of the investigations found 
in the article [32] as shown in Figure 15. 121 in order to be able to elucidate the model. Apart 
from these topology features, they also stated a similarity in K-core structure. In general, 
the model seems to be very successful for producing the similar topologies. Only topological 
dissimilarity they found is a detail difference in motifs. 

I also found a detail mistake in the article which can empower the results. The average 
number of TFs of the model networks were stated as 202 =p 14 in the article, however, 
according to my computations it was 167 =F 14 which is closer to actual number of TFs in 
the yeast: 146. 

3.4-3 Dynamical Investigation of Model 

My aim at this part of the thesis is to investigate the Balcan et al. model networks in order 
to detect whether they are also similar with respect to dynamics as they do w.r.t. topology. 

For the dynamical investigation 100 Balcan et al. model networks were used. As previous 
cases, investigations were done with using dynamically relevant subnetworks which are found 
to have 36 =p 15 nodes. Comparing to the actual yeast case (N = 82), the dynamically 
relevant nodes are very less in number. Next, both the attractors and robustness were 
calculated by starting from 100 initial conditions for each of 10 network realization for 
each network. For finding attractor, the limits for the maximum length of attractor and 
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Figure 3.12: Some reproductions of Balcan et al. models' results; a-) indegree prob. dist., b- 
)outdegree prob. dist., c-)total-degree prob. dist., d-)degree-degree correlation, e-)clustering 
coefficient, f-)richclub coefficent. x- and y-values of the data of corresponding network were 
multiplied/divided by the average degree. 

maximum step size were set to 200 and 1000, respectively. The p value for RF,CF and 
NCF were set to p = 0.27 while p of SNCF was found to be 0.30. The results for attractor 
features are shown in Table [3731 and Figure l3.131 The results for robustness is in Figure l3.14i 
As a result, Balcan et al. model networks failed to mimic the yeast. The results for the 
average and distributions resemble in-EXP yeast model except that the attractor averages 
for random functions are considerably bigger. Also, the robustness values for Balcan et al. 
model networks seems to be successful for NCF and SNCF whereas do not give appropriate 
results for RF and CF. I consider that weak part of the model is that it does not produce 
a network having as much dense dynamical core as the yeast. This could be related to 
dissimilarities they stated for the motifs. Balcan et al. model may be enhanced with 
considering the dynamcally relevant subnetwork procedure used in this thesis. 
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Figure 3.13: Probability distributions for attractor features of Balcan et al. Model Net- 
works. RF: Random Function, CF: Canalyzing Function, NCF: Nested Canalyzing Func- 
tion, SNCF: Special Subclasses of Nested Canalyzing Function. 100 model Balcan et al. 
networks and for each network, 10 realizations were created. Attractors were found by 
starting from 100 initials conditions for each realization where p = 0.27 was set for the 
functions. 
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{Naur) 


(Lattr) 


(Tattr) 


(hattr) 


RF 


3.14 =F 3.51 


4.28 =F 8.48 


10.31 =F 12.28 


0.57 =F 0.62 


CF 


2.09 =F 2.54 


2.04 =F 2.16 


5.47 =F 3.28 


0.35 T 0.52 


NCF 


1.44 =f 0.77 


1.47 =f 1.01 


4.68 =F 2.03 


0.21 =F 0.36 


SNCF 


4.36 T 7.95 


3.30 =F 3.85 


9.02 =F 5.07 


0.74 =F 0.78 



Table 3.3: Average values of attractor features of Balcan et al. Model Networks. RF: 
Random Function, CF: Canalyzing Function, NCF: Nested Canalyzing Function, SNCF: 
Snecial Suhclases nf Nested Ca.nalvzinff Function. Fnr details, see the ca/ntions of Fienire |3.1 3[ 




Figure 3.14: Robustness investigation of Model Yeast GRN with comparison to Actual 
Yeast; a-), b-), c-). Also it should be noted that for Special subclasses of Nested Canalyzing 
Functions, robustness measure is 0.83 =f 0.08 where for Yeast it is 0.78. 
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Chapter 4 
PROTEIN FOLDING 

This chapter is organized as follows: Section T4. 1 1 summarizes the protein folding problem 
in biology. Section [4,21 introduces a new approach to the problem by networks. Section [4.31 
gives the investigations on protein Serine Proteinase Inhibitor. Section [4.41 emphasizes the 
results for other proteins. 

4.1 Introduction 

Living organisms consist of five types of organic compounds: carbohydrates, lipids, nucleic 
acids, vitamins and proteins. Among these organic compounds, the protein has a privileged 
place due to its functionality, in the cells [32]. Because of this importance, proteins have 
been studied highly for decades. 




Figure 4.1: The structure of protein Serine Proteinase Inhibitor (PDB ID: 2CI2) 

A protein is a chain of aminoacids which are bound to the neighbors in the chain by 
covalent interactions. The natural amino acids are 20 types and the number of amino acids 
in a protein can vary from 20 to 3000. The amino acid sequence is determined by responsible 
gene on the DNA. The DNA sends the necessary information to ribosomes via mRNA and 
they use mRNA to produce the proteins. After this process in ribosomes, this linear chain 
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immediately folds and becomes more compact in 3D shape named as the native structure. 
This structure is crucial for the functionality of proteins in the cell. The sequence of the 
amino acids in a protein is unique for this protein and is called the primary structure of 
the protein [2] . Both the primary anc . native structures of proteins are available in a 



free-database: Protein Data Bank (PDB]^. 
4- 1.1 The Folding Problem 

Since the native structure of a protein mainly affects its functionality in the cell, elucidating 
the mechanism from primary structure to native structure has been one of the leading tasks 
in protein studies [H]. Shortly, the protein folding problem is about finding out the 
"native" structure of protein from known and unique primary sequence (Figure 14. 2p . A 
more detailed review regarding to the protein folding can be found in Reference [44] . 



Aminoacid sequence 




Native 1 true tun 



Figure 4.2: The Protein Folding Problem 



4-1.2 The Quantifier for Protein Folding 

(ft- value, (ftii To my knowledge, (ft- value is the only experimental analysis of two-state folding 
proteins [Sj. Two-state folding means that the folding occurs at first from unfolding state 
(U) to transition state (TS) and next, from transition state to folded state (F). In this 
analysis a particular aminoacid i is mutated to another aminoacid type which is generally 
Alanine [44J. Later, (fti is determined by using the experimental Gibbs-Free energies of 
mutant and wild-type protein according to Eq. 14.11 [45j: 



a ^wild-type _ \ rmu tant 
A ^wild-type a /-'mutant ' 



(4.1) 



1 http:/ /www. pdb.org 
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4.2 A New Approach to the Protein Folding 

Before getting into new approach, let me define the network of a protein, G(P). Each 
aminoacid of the protein is a node of G(P) and an edge is assigned to each pair of nodes in 
G(P) if and only if the real distance between aminoacids in native structure of the protein 
is less and equal to a certain threshold distance, rthr- One should be aware that G(P) is 
also named as contact network (map) in literature |46} 137] 



Consulting the literature |47l [46] and after some trials, I fixed the threshold distance as 
r thr = 6.5 A. Also, I prohibited the edges between the aminoacid i and j if \i — j\ < 3. 

New approach starts with a definition of another network from G(P) which is named as 
incompatibility network: IG(P) [35]. For definition of IG(P), let me refer to Figure [331 
In this figure, a protein is extended like a linear chain and starting from Vi and finishing at 
Vk a half circle is drawn if an edge exists between Vi and Vk in G(P). In incompatibility 
network IG(P); e-ik in G(P) is defined as a node Vik and an edge between vn~ and Vji is 
attached if their half circles in the figure are crossed [48J . 

Such incompatibility networks have been studied as a tool for understanding mRNA 
[49] functionality [48J such as determining important regions of mRNA, etc. [50 [ 151 4 152 ] 153] . 
Here my aim was to apply this concept to proteins. 





Figure 4.3: Definition of incompatibility networks 



Chapter ^: Protein Folding 



63 




Figure 4.4: Network of protein 2CI2, i.e. G(CI2) 

4.3 Example at hand: Serine Proteinase Inhibitor CI-2 (PDB ID:2CI2) 

Serine Proteinase Inhibitor (PDB ID: 2CI2) [53] is a simple two-state folding protein with 
65 aminoacids. It has been highly studied in literature and this is the main reason to be 
chosen in this thesis. A ribbon structure of protein 2CI2 is shown in Figure I4TT1 

(ft- value analysis of 2CI2 can be found in Reference |45j . However, I used an extended 
value data of 2CI2 (Figure 14.61 b ) and some other proteins (in Appendix [C]) found by a 
private communication with Prof. Michele Vendruscolo. 

G and IG of 2CI2 are constructed as explained above. G(2CI2) can be seen in Figure ET4"1 
However, since the figure of IG(2CI2) was not clear due to high number of edges and nodes, 
it was not inserted here. 

The degree probability distributions of G(2CI2) and IG(2CI2) are given in Figure [431 
Other topological features related to nodes are explored for the sake of seeking a correlation 
to corresponding (^-values (Figure [4.61 and Figure |4~7F|) . When a similarity is seen by visual 
inspection, a statistical analysis is done in order to understand possible correlation. For the 
explanation of the statistical analysis, let me say x^s are the candidates for being signals to 
^-values. I calculated the original root mean square deviation (RMSD) by VEi (<Pi ~ x i) 2 - 
Later, many times I shuffled the (pi to have new ^-values sets and each time I calculated 
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• G(2CI2) 

* IG(2CI2) 



Figure 4.5: Degree probability distribution of normal (G) and incompatible (IG) network 
of protein 2CI2. 



another RMSD with new set of 4>i which gave me a distribution of RMSD at the end. If the 
original RMSD was in the first deviation part of this distribution, XjS lost its candidateship 
for a signal. With this method, it was concluded that no correlation between topological 
features and ^-values of 2CI2 exists. 

4.4 Other Proteins 

Apart from 2CI2, other proteins (2PTL, 1SHF, 1TEN and 1APS) are also studied, however, 
no correlation to ^-values was seen for the limited data that is available. The Rvalue data 
of the proteins retrieved are given in Appendix [Cl 
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Figure 4.6: The comparison of ^-values and topological features of normal network for 2CI2, 
G(2CI2); a-) degree of node i, b-) ^-values of 2CI2, o) node i's average shortest distance 
to other nodes, d-) betweenness of node i 



Chapter 4- Protein Folding 66 



600 
500 
400 
M- 300 
200 
100 


40 



iiiiiiiii|iiiiiiiii|iiiiiiiii|iiiiiiiii|iiiiiiiii|iiiiiiiii|iiiiiiin 

a- 



B I I ail I I 



10 20 30 40 50 
Aminoacid i 



60 70 



30 



& 20 



10 




lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

• c- 



-• • 



i i i i i i 



1.5 



0.5 







i|iiiiiiiii|iiiiiiiii|iiiiiiiii|iiiiiiiii|iiiiiiiii|iii n iiii 
phi values of 2CI2 



? 



ii i'i 



^""1 1 1" 



iiiiiiiiliiiiiiiiiliiiiiiiiilWiiiii 



10 20 30 40 50 60 70 
Aminoacid i 



5000 



4000 - • • 



3000 
2000 
1000 




lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 



I I I ffij I I 



10 20 30 40 50 60 70 
Aminoacid i 



10 20 30 40 50 60 70 
Aminoacid i 



Figure 4.7: The comparison of ^-values and topological features of incompatibility network 
for 2CI2, IG(2CI2); a-) degree of node i, b-) ^-values of 2CI2, c-) node i's average shortest 
distance to other nodes, d-) betweenness of node i 
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Chapter 5 

CONCLUSIONS & FURTHER RESEARCH 

Gene regulations are an important functional organization of the cells. Activation of the 
genes in eukaryotes has a dynamical structures. Because of the complexity of the systems, 
rough but powerful models: networks are used for both topological and dynamical inves- 
tigations of the gene regulation. In particular, the boolean networks with synchronously 
updates are studied as generic models for the dynamical investigations. Although the in- 
dividuals with interactions in the gene regulations are well established in literature, the 
functions that govern the activation of a gene are not fully understood. For this reason, 
different types of random functions are used in the studies, i.e. simple random, canalyzing, 
nested canalyzing and special subclasses of nested canalyzing functions. 

Having fixed the network structures and the functions, the boolean dynamics possesses 
the state cycles called attractors which is the main feature of the dynamics to quantify. 
In particular, the number of attractors, the length of the attractors, the transient length 
to attractors and basin of attractions are studied for understanding the boolean dynamics. 
It has been argued that the attractors correspond to some cell cycles in living organisms. 
Moreover, the robustness of a system is studied as an important property of biological 
system as "Life at the edge of chaos" hypothesis argues. 

In this thesis, the model networks were introduced and investigated both in topology 
and dynamics. It has been shown that the fraction of dynamically relevant nodes to system 
size only depends on the average indegree (ki n ) not on the topology type. The attractor 
features were explored for the network realization parameters {ki n ) = 2.0 and p = 0.5 which 
are discussed widely in literature. The distributions show power law decays for model net- 
works. Average values of simple random functions are considerably bigger than canalyzing 
and nested canalyzing functions. Another important investigation was the scaling of the 
attractor features with the system size. I have shown that (N attr ), (L attr ) and {T a ttr) of 
in-NK networks (50 < N < 1000, = 2.0)) and simple random functions(p = 0.5) scale 
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with jV 0,53 , iV 0,87 and iV L04 , respectively. The result for the number of the attractor ver- 
ifies y/N theorem while refutes for the length of the attractors. Also, jV 0,53 scaling result 
refutes the recent study of Kauffman which argues that the scaling of the mean number of 
attractors is faster than linear. Apart from attractor studies, the robustness of the model 
networks was studied. It is shown that for all types of the topologies with simple random 
function, Derrida's expression s = 2p(l — p){ki n ) is valid with a finite-size effect. 

The yeast gene regulation network was investigated dynamically with a boolean ap- 
proach. For the sake of comparison in attractor features for all function types, p was set to 
0.27 which is the output p value for the special subclasses of the nested canalyzing functions 
(SNCF) for the yeast GRN. The results show that SNCF are successful for the optimization 
of maximising the number of attractors and minimising the attractor lengths and transients 
which might be a desirable property for a biological system. Also, the distributions of the 
attractor features were observed with a nontypical distribution behavior for the number of 
attractors and the entropy. These distributions are not decreasing for all function types, 
especially for SNCF. As an important contribution, it was seen that SNCF type may be 
crucial to elucidate the actual dynamics of the yeast gene regulation. Moreover, the yeast 
GRN was compared with model networks whose indegree distributions decay similarly. The 
results show that yeast's attractor distributions and averages are not like model networks. 
But, it is observed that the robustness structures are similar. As a conclusion, it is seen 
that to know only the indegree distribution is not enough to produce attractor features 
of the yeast GRN while being very successful for the robustness behavior. As an another 
contribution, a recent model: Balcan et al. model was discussed and it has been shown 
that it produces the similar topological networks of the yeast. However, dynamical analyses 
of this model networks established that the model is not successful at producing neither 
the attractor nor robustness structures. The main reason of this might be due to that Bal- 
can et al. model networks have low number of dynamically relevant nodes (N = 36 =F 15) 
comparing to actual yeast's (N = 82). 

At the last part of the thesis, a relation between the protein folding kinetics and a new 
network approach, incompatibility networks was asked. I have shown that no relation exists 
between some topological tools and known ^-values for the limited data of some proteins. 
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Further Research: Mean attractor features were studied and the results contradict with 
Kauffman et al. An extended study which includes also SNCF and other p- values can be 
done a further work. 

In the Appendix [Bj I discussed some finding attractor algorithms where I pointed a 
novel algorithm. I believe this algorithm would enhance the related research considerably 
and should be applied with a low-level programming language such as C++. 

The SNCF type has been shown to give different results for the yeast GRN. It should be 
noted that this type of function is also very time-efficient since it uses a logical formalism 
(AND and OR functions). All these make me consider that only SNCF type can be used 
for a further the yeast attractor investigation. 

I also saw the success of the Balcan et al. model for producing the similar topologies. I 
believe that this model can be enhanced for the dynamical successes also with a consideration 
of the dynamically relevant subnetworks used in this thesis and motifs in the literature. 

Apart from the gene regulation part, I consider that the study I have done should be 
repeated when there is richer </>-value data of the other proteins. 
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Appendix A 

ANALYTICAL EXPRESSION FOR (K IN ) 

For in-NK networks we have exactly ki n edges going into each node, therefore, (ki n ) = 
ki n . However, for in-EXP and in-PL networks k{ n values vary for different nodes. The ap- 
proximate analytical calculation of (k{ n ) follows from substituting the sum with an integral: 

in-PL: 



1 = /.ml,, A(a)k~ a dk (Normalization cond. 



(A.l) 



A(a) 



l-a 



in-EXP: 



j^max 

) = QL A(a)kk~ a dk 



(Umax 2 — a\ ( umin^ "\ 

A(a)&» '- 



umax 

1 = J.^Sn B(X)e~ dk (Normalization cond.) 

in 

e in — e m 

(kin) = itEu B(X)ke' Xk dk 



i , max . \ , min 

B (\) {k ^ aXe tn )-( fc r"e Afc "* ) + 1 



(A.2) 



(A.3) 



(A.4) 



However, since in real case we have quantized k values, i.e. k=l,2,3,..,N. (fcj n ) deviates 
from the Eqs. IA.2l and lA.4[ Correspondence of exponents of PL and EXP networks to {ki n ) 
for both analytical and actual cases is presented in Figure 12.41 This figure also discuss the 
main reason of this deviation. 
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Appendix B 

FINDING ATTRACTOR ALGORITHMS 

Finding all the attractors of a large network is a challenging task. There are mainly two 
types algorithms for finding the attractors: exhaustive and heuristic. 

An exhaustive kind algorithm is desired solution which finds all attractors. One can 
consider two types of exhaustive algorithm. First one is the straightforward method which 
starts from each initial network states and finds the attractors. However, since the numbers 
of network states (= 2^) grows exponentially with N, it is computationally infeasible with 
the available computers. After some trials I concluded this exact algorithm fails for N > 18. 
Second type is a novel algorithm which I have seen during surveying the literature and it 
claims to find all attractors of N around 100 [55]. Its main idea is to go back from the 
partial network states (a state description includes only 2—, 3—, etc. node sates) and to try 
to clarify which partial states are impossible to be found in any attractors of that network 
realization. I have scripted this algorithm with using python_| [56J although I did not get 
the efficiency stated in the paper [55], mainly because of using a higher level script language 
rather than a low level programming language such as C++. Yet, it was more powerful 
than straightforward method approximately for 17 < N < 60. 

Second approach is the heuristic algorithms which sample from the random initial con- 
ditions and finds the algorithms. Since the biological networks in the dynamics of interest 
in this thesis are of size N = 85 or more I left the exhaustive algorithms and implemented 
a heuristic algorithm. 



1 Available at http://www.python.org 
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Appendix C 
0- VALUES OF SOME PROTEINS 

^-values which are found by private communication with Prof. Michele Vendruscolo are 
given in Table EJ Table O and Table 



lbf4 




lbk2 




lshf2 




res. i 


<t>i 


res. i 


<t>% 


res. i 


<t>i 


3 


0.01 


3 


0.16 


4 


0.28 


14 





6 





6 


0.18 


16 





18 


0.32 


18 


0.06 


26 


1 


19 


0.29 


20 


0.22 


29 


0.44 


24 


0.22 


24 


0.41 


30 





31 


0.25 


26 


0.15 


31 


0.43 


38 


0.26 


28 


0.71 


34 


0.3 


39 


0.48 


39 


0.86 


36 


0.25 


41 


1 


41 


1 


40 


0.22 


47 


0.58 


44 


0.74 


42 


0.21 


48 


0.61 


50 


0.37 


44 


0.59 


50 


0.53 


55 


0.01 


45 


0.09 


53 


0.16 






50 


0.22 










54 


0.21 










55 


0.27 










58 


0.6 











Table C.l: 0-values of lbf4, lbk2 and lshf2. 
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2ci2 




2ci2 




2ptl 




2ptl 




laps 




res. i 


4>i 


res. i 


4>i 


res. i 


4>i 


res. i 


4>i 


res. i 


4>i 


3 





42 


0.1 


4 


0.58 


31 


0.4 


11 


0.93 


4 


0.05 


43 


0.1 


5 


0.27 


32 


0.21 


13 


0.37 


7 


0.1 


44 


0.1 


6 


0.41 


33 


0.3 


17 


0.1 


8 


0.4 


48 


0.2 


7 


0.64 


34 


0.09 


20 


0.18 


9 


0.2 


50 


0.5 


8 


0.56 


35 


0.29 


22 


0.09 


13 


0.35 


51 


0.26 


9 


0.14 


36 


0.29 


29 


0.15 


15 


0.7 


52 


0.2 


10 


0.41 


37 


0.12 


30 


0.42 


16 


0.4 


53 


0.1 


11 


0.59 


38 





36 


0.22 


17 


1.1 


57 


0.15 


12 


0.18 


40 


0.16 


39 


0.14 


18 


0.35 


58 


0.1 


13 


0.55 


44 





42 


0.37 


19 


0.7 


59 


0.1 


14 


0.79 


45 





45 


0.58 


21 


0.4 


61 


0.15 


15 


0.68 


48 


0.26 


47 


0.54 


22 


0.35 


62 


0.05 


17 


0.4 


49 


0.33 


51 


0.39 


23 


0.3 


64 


0.03 


19 


0.25 


51 


0.24 


54 


0.98 


25 


0.2 






20 


0.59 


52 





61 


0.21 


26 


0.2 






21 


0.85 


55 


0.12 


64 


0.34 


27 


0.4 






22 


0.5 


56 


0.25 


65 


0.27 


30 


0.25 






23 


0.45 


57 


0.14 


71 


0.09 


33 


0.1 






24 


0.3 


58 


0.28 


75 


0.02 


35 


0.15 






25 


0.45 


59 


0.17 


78 


0.02 


37 


0.2 






26 


0.8 


60 


0.19 


83 


0.04 


39 


0.1 






29 


0.27 


61 


0.09 


86 





40 


0.3 






30 


0.08 


62 





89 
94 


0.07 
0.76 



Table C.2: <p- values of 2ci2, 



2ptl and laps. 
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lten 




lfmk 




limq 




res. i 


4>i 


res. i 


4>i 


res. i 


4>i 


1 


0.04 


1 





7 


0.15 


4 


0.04 


2 


0.1 


13 


0.98 


7 


0.1 


3 


0.03 


15 


0.33 


9 


0.23 


4 


0.05 


16 


0.52 


17 


0.14 


5 





18 


0.4 


19 


0.39 


8 


0.03 


19 


0.32 


28 


0.13 


10 


0.28 


22 


0.31 


31 


0.19 


15 


0.13 


27 


0.12 


33 


0.35 


16 


0.26 


33 


0.27 


35 


0.53 


17 


0.03 


36 


0.25 


47 


0.67 


18 


0.4 


37 


0.15 


49 


0.42 


22 


0.62 


40 


0.01 


56 


0.38 


24 


0.55 


52 


0.03 


58 


0.6 


27 


0.77 


53 


0.07 


61 


0.33 


34 


0.25 


67 


0.41 


63 


0.47 


35 


0.15 


68 


0.23 


65 


0.25 


36 


0.54 


71 


0.36 


67 


0.42 


37 


1 


76 


0.37 


69 


0.54 


38 


0.08 


77 


0.37 


71 


0.29 


39 


0.95 


83 


0.31 


76 


0.21 


40 


0.72 






80 


0.03 


42 


0.86 






83 


0.21 


45 


0.68 






85 


0.08 


47 


0.56 






87 


0.11 


48 


0.71 






89 


0.11 


49 
53 


0.24 








Table C 



.3: values of 



lten, lfmk and limq. 
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